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Abstract. Recently, several algorithms have been proposed for inde- 
pendent subspace analysis where hidden variables are i.i.d. processes. We 
show that these methods can be extended to certain AR, MA, ARMA 
and ARIMA tasks. Central to our paper is that we introduce a cascade of 
algorithms, which aims to solve these tasks without previous knowledge 
about the number and the dimensions of the hidden processes. Our claim 
is supported by numerical simulations. As a particular application, we 
search for subspaces of facial components. 

1 Introduction 

Independent Subspace Analysis (ISA), also known as Multidimensional Inde- 
pendent Component Analysis [T], is a generalization of Independent Component 
Analysis (ICA). ISA assumes that certain sources depend on each other, but the 
dependent groups of sources are still independent of each other, i.e., the indepen- 
dent groups are multidimensional. The ISA task has been subject of extensive 
research jl)2|3|4|5|6|7|8)9j . In this case, one assumes that the hidden sources 
are independent and identically distributed (i.i.d.) in time. Temporal indepen- 
dence is, however, a gross oversimplification of real sources including acoustic 
or biomedical data. One may try to overcome this problem, by assuming that 
hidden processes are, e.g., autoregressive (AR) processes. Then we arrive to the 
AR Independent Process Analysis (AR-IPA) task |10|11| . Another method to 
weaken the i.i.d. assumption is to assume moving averaging (MA). This direc- 
tion is called Blind Source Deconvolution (BSD) [12J, in this case the observation 
is a temporal mixture of the i.i.d. components. 

The AR and MA models can be generalized and one may assume ARMA 
sources instead of i.i.d. ones. As an additional step, the method can be extended 
to non-stationary integrated ARMA (ARIMA) processes, which are important, 
e.g., for modelling economic processes [T3] . 

In this paper, we formulate the AR-, MA-, ARMA-, ARIMA-IPA generaliza- 
tion of the ISA tasks, when (i) one allows for multidimensional hidden compo- 
nents and (ii) the dimensions of the hidden processes are not known. We show 
that in the undercomplete case, when the number of 'sensors' is larger than the 
number of 'sources', these tasks can be reduced to the ISA task. 
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2 Independent Subspace Analysis 



The ISA task can be formalized as follows: 

x(t) = Ae(t), where e(t) = [e 1 ^); . . . ; e M (t)] G R De (1) 

and e(t) is a vector concatenated of components e m (t) G The total dimen- 
sion of the components is D e = Y^m=i ^e 1 - We assume that for a given m, e m (t) 
is i.i.d. in time t, and sources e m jointly independent, i.e., /(e 1 , . . . ,e M ) = 0, 
where /(.) denotes the mutual information (MI) of the arguments. The dimen- 
sion of observation x is D x . Assume that D x > D ej and A G ]^ D ^ xD ^ is of 
full column rank. Under these conditions, one may assume without any loss of 
generality that both the observed (x) and the hidden (e) signals are white. For 
example, one may apply Principal Component Analysis (PCA) as a preprocess- 
ing stage. Then the ambiguities of the ISA task are as follows [14]: Sources can 
be determined up to permutation and up to orthogonal transformations within 
the subspaces. 

2.1 The ISA Separation Theorem 

We are to uncover the independent subspaces. Our task is to find a matrix 
W G R D ** D * suc h that y(t) = Wx(t), y(t) = [y 1 (t); . . . ; y M (*)] , y m = 
[yY 1 ; . . . ; y^] G R d ™ , (m = 1, . . . , M) with the condition that components y m 
are independent. Here, (i) y™ denotes the i th coordinate of the m th estimated 
subspace, and (ii) W can be chosen to be orthogonal because of the whitening 
assumption. This task can be solved by means of cost function that aims to 
minimize the mutual information between components: 

J 1 (W) = 7(y 1 ,...,y M ). (2) 

One can rewrite Ji(W) as follows: 

M 

J 2 (W)=I(y\,...,y^)-J2nvT,---,yT T )- (3) 

rn=l 

The first term of the r.h.s. is the ICA cost function; it aims to minimize mutual 
information for all coordinates. The other term is a kind of anti-ICA term; it aims 
to maximize mutual information within the subspaces. One may try to apply a 
heuristics and to optimize ^ in order: (1) Start by any 'infomax' ICA algorithm 
and minimize the first term of the r.h.s. in (|3|. (2) Apply only permutations to 
the coordinates such that they optimize the second term. In this second step 
coordinates are not changed, but Q may decrease further. Surprisingly, this 
heuristics leads to the global minimum of ([2| in many cases. In other words, in 
many cases, ICA that minimizes the first term of the r.h.s. of ([3| solves the ISA 
task apart from the grouping of the coordinates into subspaces. This feature was 
observed by Cardoso, first [lj. The extent of this feature is still an open issue. 



Nonetheless, we call it 'Separation Theorem', because for elliptically symmetric 
sources and for some other distribution types one can prove that it is rigorously 
true [15]. (See also, the result concerning local minimum points [E]). Although 
there is no proof for general sources as of yet, a number of algorithms applies 
this heuristics with success |1I4I16I17I18IT9] . 

2.2 ISA with Unknown Components 

Another issue concerns the computation of the second term of (|3| . If the d™ di- 
mensions of subspaces e m are known then one might rely on multi-dimensional 
entropy estimations |9j, but these are computationally expensive. Other methods 
deal with implicit or explicit pair- wise dependency estimations |17|16| . Interest- 
ingly, if the observations are indeed from an ICA generative model, then the 
minimization of the pair-wise dependencies is sufficient to get the solution of the 
ICA task according to the Darmois-Skitovich theorem [20]. This is not the case 
for the ISA task, however. There are ISA tasks, where the estimation of pair- wise 
dependencies is insufficient for recovering the hidden subspaces |9j. Nonetheless, 
such algorithms seem to work nicely in many practical cases. 

A further complication arises if the d™ dimensions of subspaces e m are not 
known. Then the dimension of the entropy estimation becomes uncertain. Meth- 
ods that try to apply pair-wise dependencies were proposed to this task. One 
can find a block-diagonalization method in [16], whereas [17J makes use of kernel 
estimations of the mutual information. 

Here we shall assume that the separation theorem is satisfied. We shall apply 
ICA preprocessing. This step will be followed by the estimation of the pair-wise 
mutual information of the ICA coordinates. These quantities will be considered 
as the weights of a weighted graph, the vertices of the graph being the ICA 
coordinates. We shall search for clusters of this graph. In our numerical studies, 
we make use of Kernel Canonical Correlation Analysis [5] for the MI estimation. 
A variant of the Ncut algorithm [21] is applied for clustering. As a result, the 
mutual information within (between) cluster (s) becomes large (small). 

The problem is that this ISA method requires i.i.d. hidden sources. Below, we 
show how to generalize the ISA task to more realistic sources. Finally, we solve 
this more general problem when the dimension of the subspaces is not known. 

3 ISA Generalizations 

We need the following notations: Let z stand for the time-shift operation, that 
is (zv)(t) := v(t — 1). The N order polynomials of Di x D 2 matrices are denoted 
as R[z]^ xD > := {F[z] = En=o F n^\ F n e R D ^ xD ^)}. Let V r [z] := (I - Izf 
denote the operator of the r th order difference, where I is the identity matrix, 
r > 0, r e Z. 

Now, we are to estimate unknown components e m from observed signals x. 
We always assume that e takes the form like in and that A £ ^d x xD s * s Q f 
full column rank. 



1. AR-IPA: The AR generalization of the ISA task is defined by the follow- 
ing equations: x = As, where s is an AR(p) process i.e, P[z]s = Qe, 
Q e R DsXD % and P[z] := I Da - ELi e R[z]°° xDa . We assume that 
P[z] is stable, that is det(P[z] ^ 0), for all zGC, \z\ < 1. For d™ = 1 this 
task was investigated in [10]. Case d™ > 1 is treated in [TT] . The special case 
of p = is the ISA task. 

2. MA-IPA or Blind Subspace Deconvolution (BSSD) task: The ISA task is 
generalized to blind deconvolution task (moving average task, MA(q)) as 
follows: x = Q[z]e, where Q[z] = EJ= QjZ j € R[z]° xXDe . 

3. ARMA-IPA task: The two tasks above can be merged into a model, where 
the hidden s is ARMA(p,q): x = As, P[z]s = Q[z]e. Here P[z] e R[z]° sXD % 
Q[z] e R[z]® sXDe . We assumed that P[z] is stable. Thus the ARMA process 
is stationary. 

4. ARIMA-IPA task: In practice, hidden processes s may be non-stationary. 
ARMA processes can be generalized to the non-stationary case. This gen- 
eralization is called integrated ARMA, or ARIMA(p,r,q). The assumption 
here is that the r th difference of the process is an ARMA process. The cor- 
responding IPA task is then 

x = As, where P[z]V r [z]s = Q[z]e. (4) 
4 Reduction of ARIMA-IPA to ISA 

We show how to solve the above tasks by means of ISA algorithms. We treat 
the ARIMA task. Others are special cases of this one. In what follows, we as- 
sume that: (i) P[z] is stable, (ii) the mixing matrix A is of full column rank, 
and (iii) Q[z] has left inverse. In other words, there exists a polynomial matrix 
W[z] e R[z} D * xD s such that W[z]Q[z] = I De Q 

The route of the solution is elaborated here. Let us note that differentiating 
the observation x of the ARIMA-IPA task in Eq. Q in r th order, and making 
use of the relation zx = A(zs), the following holds: 

V r [z]x = A (V r [z]s) , and P[z] (V r [z]s) = Q[z]e. (5) 

That is taking V r [z]x as observations, one ends up with an ARMA-IPA task. 
Assume that D x > D e (undercomplete case). We call this task uARMA-IPA. 
Now we show how to transform the u ARMA-IPA task to ISA. The method is 
similar to that of [23J where it was applied for BSD. 

Theorem // the above assumptions are fulfilled then in the uARMA-IPA task, 
observation process x(t) is autoregressive and its innovation x(t) := x(t) — 
2£[x(t)|x(t — 1), x(t — 2), . . .] = AQoe(£) ; where E[-\-] denotes the conditional ex- 
pectation value. Consequently, there is a polynomial matrix War[z] G R[z] DxXDx 
such that War[^]x = AQoe. 

1 One can show for D s > D e that under mild conditions Q[z]-has an inverse with 
probability 1 [22]; e.g., when the matrix [Qo, . . . , Q q ] is drawn from a continuous 
distribution. 



Proof Steps of the proof: 

1. In the uARMA-IPA task the following equations hold: 



P[*]s = Q[z]e, (6) 
x = As, (7) 



i=l j=0 

x(t) = As(t). (9) 

Non- degenerate linear transformation of an ARM A process is also ARM A. 
Thus, observation process x is an ARM A process. Formally: Substituting s(t) 
of Eq. ^ into Eq. (|9| and then using the pseudoinverse of matrix A and 
expression s(t) = A x(t) that follows from Eq. (J9| ; we have 

v q 
x(t) = APiA _1 x(t - i) + ^ A Qj e (* - i)- ( 10 ) 

Process e(t) is i.i.d, so the process x(t) is ARMA. 
2. We assumed that Q[z] has left inverse and thus e of Eq. ([6| can be expressed 
from s via multiplication with a polynomial matrix. One says that e derives 
from s by causal FIR filtering. The same holds for x because of Eq. Q : 

e = P[z}s = P / [z](A- 1 x) = (P^A-^x =: P"[z]x, (11) 

where P>] := Q-^PM = ££L P n*" n e xi \ P"[z] G R[z]^ xD - 

anc? TV denotes the degree of the polynomials. 



3. The first term of the r.h.s. of the observation x in Eq. (10) is a linear expres 



sion of a finite history of x. Equation (11) implies, that the second term, 
except AQ e(t) ; also belongs to the linear hull of the finite history of x. 
Formally: 

p q 
x(t) = AQoe(t) + J) APiA-^t - i) + ^AQ J (P"[z]x)(t - j) (12) 

i=l j=l 

e AQ e(t) + (x(f - 1), x(t - 2), . . . , x(* - max(p, g + A/"))) . (13) 

^. e(t) is independent of (x(t — l),x(£ — 2), . . . , x(t — max(p, g + A/"))). Conse- 
quently, observation process x(t) is autoregressive with innovation AQ e(t). 

Thus, AR fit of x(t) can be used for the estimation of AQ e(t). This innova- 
tion corresponds to the observation of an undercomplete ISA model (D x > 

2 Assumptions made for Q[z] and A in the uARMA-IPA task implies that AQo is of 
full column rank and thus the resulting ISA task is well defined. 



which can be reduced to a complete ISA (D x = D e ) using PC A. Finally, the 
solution can be finished by any ISA procedure. The reduction procedure im- 
plies that hidden components e m can be recovered only up to the ambiguities 
of the ISA task: components of (identical dimensions) can be recovered only up 
to permutations. Within each subspaces, unambiguity is warranted only up to 
orthogonal transformations. 

The steps of our algorithm are summarized in Table [I] 



Table 1: Pseudocode of the undercomplete ARIMA-IPA algorithm 



Input of the algorithm 

Observation: {x(t)}t = i,...,T 
Optimization 

Differentiating: for observation x calculate x* = V r [z]x 

AR fit: for x* estimate War[z] 

Estimate innovation: x = War[z]x* 

Reduce uISA to ISA and whiten: x = Wpcax 

Apply ICA for x : e* = Wicax 

Estimate pairwise dependency e.g., as in |17] on e* 
Cluster e* by Ncut: the permutation matrix is V 
Estimation 

Warima-ipa [z] = PWicaWpcaWarMVI^] 
e - Warima-ipa [z]x 



5 Results 



In this section we demonstrate the theoretical results by numerical simulations. 



5.1 ARIMA Processes 

We created a database for the demonstration: Hidden sources e m are 4 pieces 
of 2D, 3 pieces of 3D, 2 pieces of 4D and 1 piece of 5D stochastic variables, 
i.e., M = 10. These stochastic variables are independent, but the coordinates of 
each stochastic variable e m depend on each other. They form a 30 dimensional 
space together (D e = 30). For the sake of illustration, 3D (2D) sources emit ran- 
dom samples of uniform distributions defined on different 3D geometrical forms 



(letters of the alphabet). The distributions are depicted in Fig. [la| (Fig. lb). 
30,000 samples were drawn from the sources and they were used to drive an 
ARIMA (2, 1,6) process defined by Q. Matrix A G R 60 * 60 was randomly gener- 
ated and orthogonal. We also generated polynomial Q[z] G IR^ * 30 and stable 
polynomial P[z] G R[z]^ 0x60 randomly. The visualization of the 60 dimensional 



process is hard to illustrate: a typical 3D projection is shown in Fig. lc The 



task is to estimate original sources e m using these non-stationary observations. 



Mi 



order differencing of the observed ARIMA process gives rise to an ARMA 



process. Typical 3D projection of this ARMA process is shown Fig.[Tdj Now, one 
can execute the other steps of Table [I] and these steps provide the estimations 
of the hidden components e m . Estimations of the 3D (2D) components are pro- 
vided in Fig. [le] (Fig. [Tf|) . In the ideal case, the product of matrix AQ and the 
matrices provided by PCA and ISA, i.e., G := (PWicaWpca)AQq G 



r>D e xD e 



is a block permutation matrix made of d™ x d™ blocks. This is shown in Fig. lg 




(e) 



A B C D 

(b) 

(f) 



(c) 



(d) 




Fig. 1: (a-b) components of the database, (a): 3 pieces of 3D geometrical forms, 
(b): 4 pieces of 2D letters. Hidden sources are uniformly distributed variables 
on these objects, (c): typical 3D projection of the observation, (d): typical 3D 
projection of the r th -order difference of the observation, (e): estimated 2D com- 
ponents, (f): estimated 3D components, (g): Hinton diagram of G, which - in 
case of perfect estimation - becomes a block permutation matrix. 



5.2 Facial Components 

We have generated another database using the FaceGerj^] animation software. 
In our database we had 800 different front view faces with the 6 basic facial 
expressions. We had thus 4,800 images in total. All images were sized to 40 x 40 
pixel. Figure 2a shows samples of the database. A large X G ]^ 4800x1600 matrix 
was compiled; rows of this matrix were 1600 dimensional vectors formed by the 
pixel values of the individual images. The columns of this matrix were considered 
as mixed signals. This treatment replicates the experiments in [24]: Bartlett 
et al., have shown that in such cases, undercomplete ICA finds components 
resembling to what humans consider facial components. We were interested in 
seeing the components grouped by undercomplete ISA algorithm. The observed 
4800 dimensional signals were compressed by PCA to 60 dimensions and we 
searched for 4 pieces of ISA subspaces using the algorithm detailed in Table [I] 
The 4 subspaces that our algorithm found are shown in Fig. [2b] As it can be 
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seen, the 4 subspaces embrace facial components which correspond mostly to 
mouth, eye brushes, facial profiles, and eyes, respectively. 




(b) 

Fig. 2: (a) Samples from the database, (b) Four subspaces of the components. 
Components in distinct groups correspond mostly to mouth, eye brushes, facial 
profiles, and eyes respectively. 



6 Conclusions 

We have extended the ISA task to problems where the hidden components can 
be AR, MA, ARM A, or ARIMA processes. We showed an algorithm that can 
identify the hidden subspaces under certain conditions. The algorithm does not 
require previous knowledge about the dimensions of the subspaces. The working 
of the algorithm was demonstrated on an artificially generated ARIMA process, 
as well as on a database of facial expressions. 
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